Natural Language Information Retrieval

Author

  • Tomek Strzalkowski
Abstract

Information Retrieval (IR) in this collection of 14 original papers is broadly construed to include document retrieval, information extraction, question answering, clustering, and classification. In the introduction, Strzalkowski asks the provocative question "Why hasn't NLP had more success in IR?", the answer to which ought to be of interest to readers of Computational Linguistics. Unfortunately, the majority of the papers in the book fail to address this question, and several do not discuss applications of NLP to IR at all. A brief summary of the papers demonstrates the range of topics covered. Karen Sparck Jones, in "What is the role of NLP in text retrieval?," gives an overview of linguistically motivated indexing (LMI) and nonlinguistic indexing (NLI). LMI is not suitable for queries with few words, yet as more words are added, the conjunction of search terms benefits NLI, raising the bar against which LMI is compared. Sparck Jones concludes that LMI might still be useful for displaying useful information about documents. Christian Jacquemin and Evelyne Tzoukermann, in "NLP for term variant extraction: Synergy between morphology, lexicon, and syntax," perform phrase normalization on the basis of full morphological analysis and patterns over parts of speech and syntactic constituents. They provide an overview of finite-state automata for morphological analysis and rule ordering for derivational affixation in French, with a tangential section on Spanish. Gerda Ruge, in "Combining corpus linguistics and human memory models for automatic term association," draws on psycholinguistic research to improve models of spreading activation within a semantic network sensitive to head/modifier relationships. Alan F. Smeaton, in "Using NLP or NLP resources for information retrieval tasks," reports that experiments with matching entire syntactic analyses for TREC yielded results much worse than traditional tf.idf measures, and has since experimented with selectively using NLP resources such as WordNet. Retrieval of picture captions with manual word sense disambiguation outperforms a tf.idf baseline. Tomek Strzalkowski, Fang Lin, Jin Wang, and Jose Perez-Carballo, in "Evaluating natural language processing techniques for information retrieval: A TREC perspective," search for appropriate ways to weight linguistic and nonlinguistic representations of document content, and explore expansions of the query based on selecting entire paragraphs. Manually selected paragraphs yield substantial gains in precision.


Similar resources

Statistical Language Models and Information Retrieval: natural language processing really meets retrieval

Traditionally, natural language processing techniques for information retrieval have always been studied outside the framework of formal models of information retrieval. In this article, we introduce a new formal model of information retrieval based on the application of statistical language models. Simple natural language processing techniques that are often used for information retrieval – we...


Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...



Applying Light Natural Language Processing to Ad-Hoc Cross Language Information Retrieval

In the CLEF 2005 Ad-Hoc Track we experimented with language-specific morphosyntactic processing and light Natural Language Processing (NLP) for the retrieval of Bulgarian, French, Italian, English and Greek.


A natural language interface to a graph-based bibliographic information retrieval system

With the ever-increasing scientific literature, there is a need for a natural language interface to bibliographic information retrieval systems to retrieve related information effectively. In this paper, we propose a natural language interface, NLI-GIBIR, to a graph-based bibliographic information retrieval system. In designing NLI-GIBIR, we developed a novel framework that can be applicable to ...


Statistical Identification of Collocations in Large Corpora for Information Retrieval

The linguistic phenomenon of collocation, the habitual juxtaposition of some words in natural language, has been shown to benefit natural language processing tasks such as information retrieval. This paper examines the utility of several methods for collocation extraction for document retrieval, specifically for queries in question form.



Journal title:
  • Inf. Process. Manage.

Volume 31

Publication date: 1995